The competition in speech recognition technology for smartphones is now in full swing with the widespread adoption of Internet of Things (IoT) devices. For robust speech recognition, it is necessary to detect speech signals in various acoustic environments. Speech/music classification, which enables signal processing optimized according to the classification result, has been widely adopted as an essential part of various electronics applications, such as multi-rate audio codecs, automatic speech recognition, and multimedia document indexing. In this paper, we propose a new technique based on long short-term memory (LSTM) to improve the robustness of the speech/music classifier in the enhanced voice service (EVS) codec, which has been adopted as the voice-over-LTE (VoLTE) speech codec. For effective speech/music classification, the feature vectors fed to the LSTM are selected from the features already computed in the EVS. To cope with the diversity of music data, a large-scale dataset is used for training. Experiments show that the LSTM-based speech/music classifier provides better results than the conventional EVS speech/music classification algorithm over various conditions and types of speech/music data, especially at low signal-to-noise ratios (SNRs).
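As a rough illustration of the classifier structure described above, the following is a minimal sketch of an LSTM that maps a sequence of per-frame feature vectors to a speech/music decision. The feature dimension, hidden size, and the use of PyTorch are assumptions for illustration only; the actual EVS-derived feature set and network configuration are those described in the paper.

```python
# Minimal sketch of an LSTM-based speech/music classifier over frame-level
# feature vectors. feat_dim=12 and hidden_dim=32 are illustrative assumptions,
# not the configuration used in the paper.
import torch
import torch.nn as nn

class SpeechMusicLSTM(nn.Module):
    def __init__(self, feat_dim=12, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)   # one logit: speech vs. music

    def forward(self, x):
        # x: (batch, frames, feat_dim) sequence of per-frame feature vectors
        out, _ = self.lstm(x)
        logit = self.fc(out[:, -1, :])        # decision from the last frame state
        return torch.sigmoid(logit)

model = SpeechMusicLSTM()
frames = torch.randn(4, 50, 12)               # 4 clips, 50 frames each (dummy data)
probs = model(frames)                         # per-clip probability of "speech"
```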